Difficulties arising in adapting and using traditional high-level languages to program parallel and distributed computations have increased interest in alternative programming paradigms, such as dataflow, assertive, reduction, and logic programming. In many of these paradigms, the proposed languages belong to the larger class of functional languages.

A functional program defines a set of functions. An execution of a functional program can be viewed as the application of a function to a set of values of its parameters. Since there is no notion of program state, the side effects of conventional programming languages are completely absent from functional programs. Consequently, multiple functions may be executed concurrently. The usefulness of functional languages in parallel programming can be further advanced if referential transparency is supported by allowing only definitions, and not assignments, in the program. Referential transparency simplifies compile-time program transformation, data dependence analysis, and parallelization.

In one of the novel approaches to parallel programming, called the assertive paradigm, computations are specified as sets of assertions about properties of the solution, and not as sequences of procedural steps. Procedural solutions are automatically generated from the assertive description. Programmers are not involved in the detailed implementation, as efficiency and correctness are assured by the underlying language translator.

Depending on the type of assertions used as the basis for a notation, different languages for assertive programming have been proposed. Perhaps the best known is logic programming, with the Prolog language as its prime example. In Prolog, assertions are expressed as Horn clauses. Automatic inference of new facts from the given rules and known facts makes it a convenient programming tool for artificial intelligence and expert systems. However, in applications that involve numerical computations, Prolog's usefulness is questionable because numerical algorithms are inconvenient to express in that language.

Another notation for assertive programming was proposed in equational languages, where assertions are expressed as algebraic equations. Programs written in equational languages are concise, free from implementation details, and easily amenable to verification and parallel processing. Those programs, however, require a sophisticated translator to generate efficient object code; global analysis and heuristic program transformations are necessary to achieve a quality translation. The envisaged role of the computer is not to execute prescribed operations step by step, as in procedural programming, but to find values of the unknown variables for which all stated assertions become true.

A modern scientific and engineering computation language should satisfy several additional postulates arising from the plethora of architectures on which such computations can be executed. The most important postulate is to separate the issue of execution from the meaning of the computation. In other words, the language should enable the programmer to separate ‘what’ from ‘how’: the description of the meaning of the computation should be separated from the statements (if any) directing the compiler in translating the computation for execution on a particular architecture. Such a feature would also contribute to rapid prototyping and increased portability of the code.

In defining any large-scale computation, proper facilities for problem decomposition are of the utmost importance. The description of the computation in each of the decomposed subtasks should be separate from the description of the interactions between those subtasks. In conventional programming, the first description roughly corresponds to programming-in-the-small, and the second to programming-in-the-large. A language for large-scale scientific and engineering programming should provide means for keeping these descriptions independent.

An assignment statement, the cornerstone of any conventional programming language, is the main source of difficulty in parallel programming, since it changes the value of the variable on its left-hand side. Thus, the value of any variable at a certain point of program execution is defined by the last executed assignment statement for this variable. Consequently, the execution of any procedure that uses global variables is affected not only by the values of its parameters but also by the assignments to these global variables. Likewise, the effects of a procedure's execution may include changes to the values of the global variables that are assigned in the procedure body. If there are parallel execution paths, each containing assignment statements for a variable, then the final value of this variable may depend on the relative speeds with which those paths are executed. To avoid such side effects of the assignment statement, many parallel functional languages enforce the single assignment rule, which states that each variable can have only one value and that, prior to the assignment of that value, the variable is undefined.
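The single assignment rule can be pictured as a write-once cell. The Python sketch below is only an illustration under our own assumptions: the names `SingleAssignmentVar`, `SingleAssignmentError`, and `UndefinedValueError` are hypothetical, and the checks happen at run time, whereas a compiler for a single-assignment language would typically reject a second definition before execution.

```python
class SingleAssignmentError(Exception):
    """Raised when a single-assignment variable is assigned a second time."""


class UndefinedValueError(Exception):
    """Raised when a variable is read before it has been assigned."""


class SingleAssignmentVar:
    """A write-once cell: undefined until assigned, immutable afterwards."""

    _UNSET = object()  # sentinel marking the 'undefined' state

    def __init__(self, name: str) -> None:
        self.name = name
        self._value = SingleAssignmentVar._UNSET

    def is_defined(self) -> bool:
        return self._value is not SingleAssignmentVar._UNSET

    def assign(self, value) -> None:
        # A second assignment would make the final value depend on which
        # execution path ran last, so it is rejected outright.
        if self.is_defined():
            raise SingleAssignmentError(f"{self.name} is already defined")
        self._value = value

    def read(self):
        # Reading before assignment corresponds to using an undefined value.
        if not self.is_defined():
            raise UndefinedValueError(f"{self.name} is undefined")
        return self._value
```

Reading a variable before `assign` raises `UndefinedValueError`, and a second `assign` raises `SingleAssignmentError`, so the observed value can never depend on which of two parallel paths finished last.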


The single assignment rule leads to declaring and operating on structures that have ‘excessive’ dimensionality. After all, a variable that would merely be reassigned in a traditional language has to be viewed, under the single assignment rule, as a vector of values, with each reassignment indexed by a different subscript. However, the multidimensional view of such a variable is logical only. The optimizing compiler should easily be able to eliminate such an excessive dimension from the variable's implementation. If there are several candidate dimensions for elimination, the compiler can select which one to eliminate by analyzing all data dependences in the computation. Thus, it is reasonable to expect that such a selection will be at least as good as the selection made by a programmer, who typically bases the decision more on intuition and the meaning of the variable than on implementation efficiency.
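As an illustration of eliminating an excessive dimension, consider a running sum. The sketch below assumes a hypothetical equation pair s[0] = 0 and s[i] = s[i-1] + a[i-1]; it contrasts the logical single-assignment array with the scalar a compiler could keep once dependence analysis shows that s[i] reads only s[i-1] and that only the last element is used afterwards. The function names are illustrative, not part of EPL.

```python
from typing import List, Sequence


def prefix_sum_single_assignment(a: Sequence[float]) -> List[float]:
    """Logical view: every 'reassignment' of s gets its own subscript.

    s[0] = 0 and s[i] = s[i-1] + a[i-1] for i = 1..n; each element of s is
    defined exactly once, so s has one more entry than a.
    """
    s = [0.0] * (len(a) + 1)
    for i in range(1, len(a) + 1):
        s[i] = s[i - 1] + a[i - 1]
    return s


def final_sum_eliminated(a: Sequence[float]) -> float:
    """Implementation view after eliminating the subscript of s.

    s[i] depends only on s[i-1] (dependence distance 1) and only the last
    element is needed afterwards, so one scalar can hold the live element.
    """
    s = 0.0
    for x in a:
        s = s + x  # the storage of s[i-1] is reused for s[i]
    return s
```

Both functions compute the same final value; only the first materializes the ‘excessive’ dimension.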

Finally, to enable a language compiler to exploit parallelism at its lowest level, the language should also provide operators that can be applied to components of their arguments in a dataflow fashion, i.e., in the order in which those components become available, and not in the order of their declaration.
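A dataflow-style operator can be approximated in Python with the standard `concurrent.futures.as_completed`, which yields futures as they finish. This is a sketch under assumptions: the producers and their delays are hypothetical stand-ins for argument components computed elsewhere, and summation is chosen because it is associative and commutative, so consuming components in arrival order cannot change the result.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, List, Tuple


def reduce_in_arrival_order(producers: List[Callable[[], int]]) -> Tuple[int, List[int]]:
    """Sum the results of independent producers in the order they finish.

    Returns the total and the indices of the components in arrival order.
    """
    total = 0
    arrival: List[int] = []
    with ThreadPoolExecutor(max_workers=max(1, len(producers))) as pool:
        futures = {pool.submit(p): i for i, p in enumerate(producers)}
        for fut in as_completed(futures):
            arrival.append(futures[fut])  # which component became available
            total += fut.result()         # consume it immediately
    return total, arrival


def delayed(value: int, seconds: float) -> Callable[[], int]:
    """Hypothetical producer that yields `value` after a delay."""
    def run() -> int:
        time.sleep(seconds)
        return value
    return run
```

With producers `delayed(1, 0.2)`, `delayed(2, 0.0)`, `delayed(3, 0.1)`, the total is always 6, while the arrival order follows the delays rather than the declaration order.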

Equational languages are naturally suited to mathematical modeling. They are convenient for describing computations that involve solving systems of linear equations, which may arise directly (as, e.g., in econometric modeling) or as the result of a discrete approximation of a system of differential equations. Various numerical aspects of the solution, such as the applied method, initial values, or convergence criteria, can be either generated automatically by default or provided by the programmer. Equational languages have also proven to be an effective tool for describing general computational tasks.
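For linear systems, the defaults mentioned above can be modeled as default parameter values. The sketch below uses Jacobi iteration purely as an example method (the text does not say which method EPL selects); the zero initial vector, the tolerance `tol`, and the iteration limit `max_iter` are our assumed defaults, which a caller can override just as a programmer could supply them. Jacobi iteration converges, for example, when the matrix is strictly diagonally dominant.

```python
from typing import List, Optional, Sequence


def solve_jacobi(
    A: Sequence[Sequence[float]],
    b: Sequence[float],
    x0: Optional[List[float]] = None,  # assumed default initial values: zeros
    tol: float = 1e-10,                # assumed default convergence criterion
    max_iter: int = 10_000,
) -> List[float]:
    """Find x with A x = b by Jacobi iteration.

    Each sweep computes x_new[i] = (b[i] - sum_{j != i} A[i][j] * x[j]) / A[i][i]
    from the previous iterate only, so the n updates are independent.
    """
    n = len(b)
    x = list(x0) if x0 is not None else [0.0] * n
    for _ in range(max_iter):
        x_new = [
            (b[i] - sum(A[i][j] * x[j] for j in range(n) if j != i)) / A[i][i]
            for i in range(n)
        ]
        # Stop when no component changed by more than the tolerance.
        if max(abs(x_new[i] - x[i]) for i in range(n)) < tol:
            return x_new
        x = x_new
    raise RuntimeError("Jacobi iteration did not converge")
```

For the system 4x + y = 1, 2x + 3y = 2, whose exact solution is x = 0.1, y = 0.6, `solve_jacobi([[4, 1], [2, 3]], [1, 2])` approaches that solution. Because every component of `x_new` depends only on the previous iterate, each sweep could be evaluated in parallel.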

Equational Programming Language, abbreviated EPL, is a simple non-strict functional language with type inference designed for programming parallel and distributed computation. The language is defined in terms of just a few constructs: generalized arrays and subscripts for data structures, recurrent equations for data value definitions, ports for process interactions, and virtual processors for execution directives. An EPL program consists of data declarations and annotated conditional equations. Equations are defined over multidimensional jagged-edge arrays and may be annotated by the virtual processors on which they are executed. Data declarations are annotated by the record and port designators, which are used to identify interfaces with an external environment and other programs.

In addition to programs, the EPL user can define configurations, which describe interconnections between ports of different processes. Configurations allow the programmer to reuse the same EPL programs in different computations. They also facilitate computation decomposition. A port creates a fair merge of its input sequences and hence makes non-determinism easy to express without changing the functional character of a program definition.
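The effect of a fair merge can be sketched with a round-robin interleaving of the input sequences. This is a deterministic stand-in chosen for illustration: a real port would merge messages in whatever order they arrive, which is where the non-determinism comes from, but both strategies share the two properties that matter here, namely that no input is starved and that the order within each input is preserved. The function name `fair_merge` is hypothetical.

```python
from collections import deque
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")


def fair_merge(*inputs: Iterable[T]) -> Iterator[T]:
    """Interleave several input sequences so that no input is starved.

    Every input that still has data is served once per round, and each
    input's own order is preserved.
    """
    active = deque(iter(seq) for seq in inputs)
    while active:
        it = active.popleft()
        try:
            item = next(it)
        except StopIteration:
            continue  # this input is exhausted; drop it from the rotation
        yield item
        active.append(it)  # requeue behind the other inputs
```

For example, merging `[1, 2, 3]`, `"ab"`, and an empty list yields `1, 'a', 2, 'b', 3`.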

In addition to single-valued data structures, EPL programs contain subscripts that assume a range of integers as their values. Subscripts give EPL a dual flavor. In the definitional view, they may be treated as universal quantifiers, and equations are then viewed as logical predicates. In the operational view, they can be seen as loop control variables, and each equation is then seen as a statement nested in the loops implied by its subscripts. A more detailed description of the language may be found elsewhere.
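The two views of subscripts can be contrasted on a small hypothetical equation, X[0] = 0 and X[i] = X[i-1] + 2i for 1 ≤ i ≤ n. In the sketch below, `evaluate` follows the operational view, turning the subscript into a loop, while `satisfies` follows the definitional view, checking the equation as a predicate universally quantified over i. Both functions are illustrations, not EPL compiler output.

```python
from typing import List


def evaluate(n: int) -> List[int]:
    """Operational view: subscript i acts as a loop control variable."""
    X = [0] * (n + 1)              # X[0] = 0
    for i in range(1, n + 1):      # the loop implied by subscript i
        X[i] = X[i - 1] + 2 * i
    return X


def satisfies(X: List[int], n: int) -> bool:
    """Definitional view: the equation must hold for every i in 1..n."""
    return X[0] == 0 and all(X[i] == X[i - 1] + 2 * i for i in range(1, n + 1))
```

Here `evaluate(4)` produces `[0, 2, 6, 12, 20]`, and `satisfies` accepts exactly that array as a solution while rejecting any array that violates the equation for some i.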

The basic techniques used in the compilation of EPL programs are data dependence analysis and data attribute propagation. In a single program, the dependencies are represented in a compact form by the conditional array graph. This graph associates each dependence with its attributes, such as the distance between dependent elements, the conditions under which the dependence holds, and the subscripts associated with it. Both explicit data dependencies (defined by the use of one data structure in an equation defining another) and implicit data dependencies (implied, for example, by the sequential reading of incoming messages) are represented as various kinds of edges in the conditional array graph. Each node of this graph contains information about the represented entity, such as the number and ranges of its dimensions, its type and class, and the conditions guarding its definitions.
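A possible in-memory shape for the conditional array graph is sketched below with Python dataclasses. The field names (`ranges`, `elem_type`, `kind`, `guard`, `distance`, `condition`, `subscripts`, `implicit`) are our assumptions, chosen to mirror the attributes listed above; the actual representation used by the EPL compiler is not described here. The example encodes a hypothetical recurrence S[i] = A[1] for i = 1 and S[i] = S[i-1] + A[i] otherwise, with an assumed range 1..100.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Node:
    """A data structure of the program (hypothetical field names)."""
    name: str
    ranges: List[Tuple[int, int]]       # (low, high) for each dimension
    elem_type: str = "real"
    kind: str = "interior"              # class, e.g. "input", "output"
    guard: Optional[str] = None         # condition guarding its definition


@dataclass
class Edge:
    """Dependence of `target` on `source`, with its attributes."""
    source: str
    target: str
    distance: Tuple[int, ...]           # per-subscript distance of elements
    condition: Optional[str] = None     # when the dependence holds
    subscripts: Tuple[str, ...] = ()
    implicit: bool = False              # e.g. ordering of message reads


@dataclass
class ConditionalArrayGraph:
    nodes: Dict[str, Node] = field(default_factory=dict)
    edges: List[Edge] = field(default_factory=list)

    def add_node(self, node: Node) -> None:
        self.nodes[node.name] = node

    def add_edge(self, edge: Edge) -> None:
        if edge.source not in self.nodes or edge.target not in self.nodes:
            raise KeyError("both endpoints must be declared first")
        self.edges.append(edge)

    def predecessors(self, name: str) -> List[Edge]:
        return [e for e in self.edges if e.target == name]


def example_graph() -> ConditionalArrayGraph:
    """Hypothetical recurrence over i = 1..100 (see the lead-in)."""
    g = ConditionalArrayGraph()
    g.add_node(Node("A", [(1, 100)], kind="input"))
    g.add_node(Node("S", [(1, 100)], kind="output"))
    g.add_edge(Edge("A", "S", distance=(0,), subscripts=("i",)))
    # S[i] reads S[i-1] only when i > 1: a conditional, distance-1 edge.
    g.add_edge(Edge("S", "S", distance=(1,), condition="i > 1", subscripts=("i",)))
    return g
```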

The  correctness  of  the  program  is  checked  by  verifying  the  consistency  of  the 
different  attributes  of  data  structures  and  data  dependencies.  To  accomplish  this,  the 
EPL  compiler  propagates  data  and  dependence  attributes  along  the  edges  of  the  graph. 
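Attribute propagation and consistency checking can be sketched for one attribute, the element type. The rule below is a deliberately simplified assumption: an undeclared node inherits the type of its source along each dependence edge, and a disagreement between two known types is reported rather than overwritten. A real checker would also handle coercions, dimension ranges, and conditions; the function name `propagate_types` is hypothetical.

```python
from collections import deque
from typing import Dict, List, Optional, Tuple


def propagate_types(
    declared: Dict[str, Optional[str]],   # None means the type is not declared
    edges: List[Tuple[str, str]],         # (source, target) dependence edges
) -> Tuple[Dict[str, Optional[str]], List[str]]:
    """Propagate element types forward along dependence edges.

    Returns the inferred types and a list of detected inconsistencies.
    """
    types = dict(declared)
    errors: List[str] = []
    succ: Dict[str, List[str]] = {n: [] for n in types}
    for s, t in edges:
        succ[s].append(t)
    # Start from every node whose type is already known.
    work = deque(n for n, ty in types.items() if ty is not None)
    while work:
        s = work.popleft()
        for t in succ[s]:
            if types[t] is None:
                types[t] = types[s]
                work.append(t)  # a newly typed node propagates further
            elif types[t] != types[s]:
                msg = f"{s}:{types[s]} -> {t}:{types[t]}"
                if msg not in errors:
                    errors.append(msg)
    return types, errors
```

Each node is typed at most once and enqueued at most once, so the propagation terminates after a number of steps proportional to the size of the graph.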

A  similar  dependence  graph  is  also  created  for  a  configuration.  It  shows  the  data 
dependences  among  the  processes  of  the  computation  and  is  used  in  scheduling 
processes  and  mapping  them  onto  the  processors. 
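One way such a configuration graph could drive scheduling is to group processes into stages by topological order, using the standard `graphlib.TopologicalSorter` (Python 3.9 and later). This sketch assumes an acyclic configuration in which each process consumes the complete output of its predecessors; configurations with feedback loops, or processes that stream data to each other concurrently, would need a different treatment. The process names used below are hypothetical.

```python
from graphlib import TopologicalSorter  # Python 3.9+
from typing import Dict, List, Set


def schedule_levels(deps: Dict[str, Set[str]]) -> List[List[str]]:
    """Group processes into stages from a configuration dependence graph.

    `deps` maps each process to the processes whose port output it reads.
    Processes in the same stage have no mutual dependences and could be
    mapped onto different processors. Raises graphlib.CycleError (a
    subclass of ValueError) if the graph has a cycle.
    """
    ts = TopologicalSorter(deps)
    ts.prepare()
    levels: List[List[str]] = []
    while ts.is_active():
        ready = sorted(ts.get_ready())  # sorted only for a stable printout
        levels.append(ready)
        ts.done(*ready)
    return levels
```

For a reader feeding a filter and a logger, with a solver after the filter and a writer that needs both the solver and the logger, `schedule_levels` yields the stages `[['reader'], ['filter', 'logger'], ['solver'], ['writer']]`; the filter and the logger could run on different processors at the same time.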

The extent of transformation required to generate the object code depends on the architecture at hand. As with LUCID, presented in the second chapter, EPL programs can be executed almost directly on a specialized tagged dataflow architecture that consists of the following functional elements:

token memory: A memory in which each tagged value (subscripted variable) is stored after it has been read in or evaluated.

matching unit: A unit that releases an equation instance for execution when all data values needed for that equation's evaluation are present in the token memory.

executing unit: One of a number of arithmetic units able to evaluate EPL operators.
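The cooperation of these three units can be simulated sequentially. In the sketch below, which is a toy model and not a description of any particular machine, a dictionary keyed by (variable, subscript) tags plays the token memory, a scan over pending equation instances plays the matching unit, and calling each released instance's operator plays an executing unit. The names `Instance` and `run_dataflow` are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Tag = Tuple[str, int]  # (variable name, subscript value)


@dataclass(eq=False)  # identity comparison: every instance is distinct
class Instance:
    """One equation instance: target tag, input tags and its operator."""
    target: Tag
    inputs: List[Tag]
    op: Callable[..., float]


def run_dataflow(instances: List[Instance], inputs: Dict[Tag, float]) -> Dict[Tag, float]:
    """Toy tagged-token machine, executed in 'waves'.

    token memory  : dict from tag to value, seeded with the values read in
    matching unit : releases instances whose input tokens are all present
    executing unit: applies the operator and stores the result token
    """
    token_memory: Dict[Tag, float] = dict(inputs)
    pending = list(instances)
    while pending:
        ready = [ins for ins in pending
                 if all(t in token_memory for t in ins.inputs)]
        if not ready:
            raise RuntimeError("deadlock: some instances can never fire")
        for ins in ready:  # instances in one wave could execute in parallel
            args = [token_memory[t] for t in ins.inputs]
            token_memory[ins.target] = ins.op(*args)
        pending = [ins for ins in pending if ins not in ready]
    return token_memory
```

With input tokens A[i] = i for i = 1..4 and the instances S[1] = A[1] and S[i] = S[i-1] + A[i], the recurrence lets only one S instance fire per wave, and the token memory ends with S[4] = 10.0; instances with independent inputs would instead fire in the same wave.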

The conditional array graph defines the number of copies of a value needed in the token memory for each evaluated subscripted variable. One copy is needed for each edge outgoing from the node representing the corresponding variable. Each process may have a separate dataflow machine assigned to its execution, and these machines have to be connected to exchange data through process ports. Alternatively, one dataflow machine may be allocated to the entire computation, and then ports would merely define equivalences of variables from different processes. With a sufficient number of arithmetic units, the dataflow implementation can provide the highest parallelism. However, eager scheduling of EPL computations can easily lead to an excessive demand for token memory.
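The copy-count rule stated above amounts to computing out-degrees in the dependence graph. The sketch below assumes the graph is given as a list of (source, target) edges and that the number of elements of each variable is known; `token_copies` and `peak_token_bound` are hypothetical names, and the bound is a worst case that eager scheduling can approach when many values of a variable are produced before any is consumed. Consumption by output ports is not modeled.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def token_copies(edges: List[Tuple[str, str]], variables: Iterable[str]) -> Dict[str, int]:
    """Copies of each evaluated value the token memory must hold.

    One copy per edge leaving the variable's node, i.e. its out-degree.
    """
    out_degree = Counter(src for src, _ in edges)
    return {v: out_degree[v] for v in variables}


def peak_token_bound(copies: Dict[str, int], extents: Dict[str, int]) -> int:
    # Worst case: every element of every variable is live with all its copies.
    return sum(copies[v] * extents[v] for v in copies)
```

For edges A → S, S → S (the recurrence), A → T, and S → T, each value of A and S needs two copies and T needs none, so with four elements per variable the worst case is 16 tokens.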

In Flynn's well-known classification of parallel computational models, the von Neumann model is characterized by a Single stream of Instructions controlling a Single stream of Data (SISD). To achieve parallelism, Multiple Data streams have been introduced, creating the SIMD model. A further extension adds Multiple Instruction streams, and this extension leads to MIMD architectures. The last category can be conveniently split, on the basis of the data access mechanism, into shared-memory and distributed-memory architectures. In shared-memory architectures, processors have equal access to one global memory. In distributed-memory architectures, each processor has direct access to its local memory and indirect access to the memory of other processors. The indirect access is typically supported through a message-passing mechanism that enables processors to communicate with each other.
For a SIMD machine, such as the Connection Machine, the major task of EPL translation is to identify an EPL subscript that will index individual processors (i.e., a subscript that defines the domain of the computation). Equations indexed by the domain subscript will be executed in parallel on different processors. The selection of the domain subscript is influenced by the fact that a reference using an indexing expression different from the domain subscript implies communication of data from another processor.
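The choice of domain subscript can be framed as minimizing the number of off-processor references. The sketch below rests on simplifying assumptions of our own: every equation is indexed by all candidate subscripts, every array is distributed along the dimension indexed by the chosen subscript, and a reference is local only if it uses exactly that subscript in that dimension. It is a heuristic illustration, not the EPL translator's actual selection procedure.

```python
from typing import List, Tuple

# An equation is (lhs_subscripts, references); each reference lists the
# index expression used in each dimension, aligned with lhs_subscripts.
Equation = Tuple[Tuple[str, ...], List[Tuple[str, ...]]]


def communication_cost(eqs: List[Equation], domain: str) -> int:
    """Count references that would need data from another processor."""
    cost = 0
    for lhs, refs in eqs:
        k = lhs.index(domain)  # dimension distributed across processors
        cost += sum(1 for ref in refs if ref[k] != domain)
    return cost


def choose_domain(eqs: List[Equation], candidates: List[str]) -> str:
    # min() keeps the first candidate among equal costs.
    return min(candidates, key=lambda s: communication_cost(eqs, s))
```

For the hypothetical equations U[i,j] = V[i,j-1] + V[i,j+1] + W[i,j] and V[i,j] = U[i-1,j], choosing i leaves one remote reference (U[i-1,j]) while choosing j leaves two, so `choose_domain` picks i.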

Translation for Multiple Instruction Multiple Data (MIMD) machines is even more involved. For shared-memory architectures in this class, only the placement of equations is an issue. The ports can be easily and efficiently implemented as blocks of shared memory. Efficient translation requires strong memory optimization to counterbalance the effects of the ‘excessive’ dimensions present due to the single assignment rule of the language.

For distributed-memory machines, the additional difficulty arises from data placement. In the current implementation of the EPL code generator for the Intel hypercube, it is assumed that data are distributed together with the equations that define them. This assumption makes the optimal allocation of equations to processors a more complex task.
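The trade-off created by placing data with their defining equations can be illustrated by scoring an allocation. The sketch below assumes one message per dependence between equations placed on different processors and measures load as the number of equations per processor; `evaluate_allocation` is a hypothetical helper, and a realistic cost model would weight equations by work and messages by data volume.

```python
from collections import Counter
from typing import Dict, List, Tuple


def evaluate_allocation(
    placement: Dict[str, int],        # equation (and the data it defines) -> processor
    deps: List[Tuple[str, str]],      # (producer, consumer) dependences
) -> Tuple[int, int]:
    """Return (messages, largest number of equations on one processor).

    Since data live on the processor of their defining equation, every
    dependence that crosses processors turns into a message.
    """
    messages = sum(1 for p, c in deps if placement[p] != placement[c])
    load = Counter(placement.values())
    return messages, max(load.values())
```

For two independent chains E1 → E2 and E3 → E4, placing each chain on its own processor gives (0, 2), i.e., no messages and balanced load, while splitting both chains across processors gives (2, 2), and putting everything on one processor gives (0, 4).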

A more detailed discussion and documentation of the work done in this project is presented in the nine papers and six technical reports cited below, which were prepared in connection with the research performed under the reported contract.

The general description of the language, the justification of the design, and the outline of the implementation have been presented in [2,4]. The details of the implementation of the EPL compiler for a single process specification have been documented in [tr2, tr3]. In real-time applications with time constraints, and also in load balancing of parallel computations, reliable estimates of the computational delays incurred in each participating process are of utmost importance. In [tr1] we presented a technique and a software tool for finding such estimates for programs specified in a definitional language (MODEL or EPL). The program integration problem is addressed in the EPL system by a configurator. The configurator implementation has been documented in [tr4] and reported in [9,8]. A particularly interesting environment for the integration of real-time programs arises in Ada; the problems it poses and their solutions in the framework of definitional programming have been discussed in [6].

One of the new software tools developed for processing EPL specifications was the conditional data dependence analyzer, documented in [tr5] and reported in [5,7]. Our implementation of the EPL code generator for the shared-memory Sequent Balance 21000 parallel computer employs mutual exclusion. We have developed a new, robust software solution to the mutual exclusion problem that requires only a small number of shared variables [8].

Annotations have been introduced in EPL as a means of semantically clean expression of user directives for partitioning and mapping computations for parallel processing. The annotations are translated into purely functional EPL programs by the preprocessor documented in [tr6]. Finally, our initial results of the investigation into parallel fine-grain EPL scheduling have been presented in [3].